2022-05-10

Article introduction

Title: “Peripheral Blood Mitochondrial DNA Copy Number Is Associated with Prostate Cancer Risk and Tumor Burden”

Authors: Weimin Zhou, Min Zhu, Ming Gui, Lihua Huang, Zhi Long, Li Wang, Hui Chen, Yinghao Yin, Xianzhen Jiang, Yingbo Dai, Yuxin Tang, Leye He, Kuangbiao Zhong

Goal: Determine whether peripheral blood mtDNA copy number is a predictor of prostate cancer (PCa)

Flowchart for project flow

Data set overview

Loading

  • Dimensions of the raw data set: 392 rows, 13 columns

  • Stratified on Controls and PCa cases (attribute called Group)

  • Purpose of article: Predict PCa from other variables, mainly mtDNA

Cleaning

  • Check for duplicates

  • Filter for PCRsuccess

  • New dimensions: 387 rows, 13 columns
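
The two cleaning steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the rows are made up, `ID` is a hypothetical column, duplicates are taken to mean fully identical rows, and `PCRsuccess` is assumed to be coded 1 for success. Only the names `PCRsuccess` and `Group` come from the slides.

```python
# Made-up rows; "ID" is a hypothetical column, and the 0/1 coding is an assumption.
rows = [
    {"ID": 1, "Group": 0, "PCRsuccess": 1},
    {"ID": 2, "Group": 1, "PCRsuccess": 1},
    {"ID": 2, "Group": 1, "PCRsuccess": 1},  # exact duplicate of the previous row
    {"ID": 3, "Group": 1, "PCRsuccess": 0},  # failed PCR, removed by the filter
]


def drop_duplicates(records):
    """Keep the first occurrence of each fully identical record."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def filter_pcr_success(records):
    """Keep records whose PCRsuccess flag equals 1 (assumed to mean success)."""
    return [r for r in records if r["PCRsuccess"] == 1]


deduplicated = drop_duplicates(rows)
cleaned = filter_pcr_success(deduplicated)
print(len(rows), len(deduplicated), len(cleaned))  # 4 3 2
```

Printing the row count after each step mirrors the "New dimensions" bullet and makes it easy to see which step removed rows.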

Augmenting

  • BMI and DFI classifiers

  • New columns based on TNM-notation

  • Add “Group” as strings

  • New dimensions: 387 rows, 18 columns
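
A minimal sketch of what such augmentation helpers could look like. The slides do not give the cut-offs or encodings, so everything here is an assumption: the BMI classes use the standard WHO thresholds, the TNM pattern is a simplified one, and the 0/1 coding of `Group` is hypothetical. The DFI classifier is left out because its definition is not stated.

```python
import re


def bmi_class(bmi):
    """Classify BMI with the standard WHO cut-offs (assumed, not from the slides)."""
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25:
        return "Normal weight"
    if bmi < 30:
        return "Overweight"
    return "Obese"


# Simplified TNM pattern: a digit or "x" after T, N and M, with an optional
# substage letter after the T value (for example "T3a"). Real TNM strings can
# carry more detail than this handles.
TNM_PATTERN = re.compile(r"T([0-4x])[a-c]?N([0-3x])M([01x])", re.IGNORECASE)


def split_tnm(tnm):
    """Split a TNM string into separate T, N and M values, or return None."""
    match = TNM_PATTERN.fullmatch(tnm.strip())
    if match is None:
        return None
    t, n, m = (part.lower() for part in match.groups())
    return {"T": t, "N": n, "M": m}


# Hypothetical coding of the Group attribute.
GROUP_LABELS = {0: "Control", 1: "PCa"}

print(bmi_class(27.3))      # Overweight
print(split_tnm("T2N0M0"))  # {'T': '2', 'N': '0', 'M': '0'}
print(GROUP_LABELS[1])      # PCa
```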

Boxplot with continuous variables, any outliers?

Boxplot with discrete variables, any outliers?
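
The outlier question can also be checked numerically. The sketch below applies Tukey's rule, the convention most boxplot implementations use for drawing points beyond the whiskers: a value is flagged when it lies more than 1.5 times the interquartile range outside the quartiles. The sample values are made up, and quartile definitions differ slightly between tools, so borderline points may be classified differently than in a given plot.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


sample = [1, 2, 3, 4, 5, 100]  # made-up values, not from the data set
print(tukey_outliers(sample))  # [100]
```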


Re-creating plot from the article

Article visualization

A better biomarker for PCa?

Logistic regression, excl. PSA

Significant p-values:
Maybe the distribution of DFI classes is skewed?
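
To show where a "significant p-value" comes from in a logistic regression, here is a minimal single-predictor sketch fitted by Newton-Raphson, with a Wald test on the slope. It is not the model from the slides: the data are made up, only one predictor is used, and the function name `fit_logistic` is hypothetical. With so few points the p-value itself carries no meaning; only the mechanics are illustrated.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson.

    Returns (b0, b1, se_b1), where se_b1 is the Wald standard error of the slope.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # entries of the information matrix X'WX
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: add (X'WX)^-1 times the gradient, using the 2x2 inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # The information matrix from the last pass is evaluated just before the
    # final (negligible) step, so it is effectively the value at convergence.
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1


# Made-up, non-separable toy data: predictor values and a binary outcome.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1, se_b1 = fit_logistic(x, y)
z = b1 / se_b1
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided Wald p-value
print(f"slope={b1:.3f}, z={z:.3f}, p={p_value:.3f}")
```

The toy data are deliberately not perfectly separable, because the maximum-likelihood estimate does not exist for separable data and the Newton iterations would diverge.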

Logistic regression, incl. PSA

Significant p-values:

Principal component analysis (PCA)

PCA

Interesting finding during exploratory data analysis

Some more data exploration

Conclusion

  • Our analysis supports the article's conclusion that mtDNA is a biomarker for PCa (i.e., the result is reproducible)
  • PSA levels seem to be an even better biomarker
  • Both of the above are supported by the logistic regressions
  • Conclusion for PCA?
  • Further research: classification on Gleason scores/AJCC should be possible, as should regression (albeit outside the scope of this course)